Abstract

In this paper we consider the problem of efficient computation of cross-moments of a vector random variable 
represented by a stochastic context-free grammar. Two types of cross-moments are discussed. The sample 
space of the first one is the set of all derivations of the context-free grammar, and the sample space of the 
second one is the set of all derivations which generate a string belonging to the language of the grammar. In 
the past, this problem has been widely studied, but mainly for the cross-moments of scalar variables and up 
to the second order. This paper presents new algorithms for computing the cross-moments of an arbitrary 
order, and the previously developed ones are derived as special cases. 

Keywords: stochastic context-free grammar, cross-moments, semiring, moment generating function, 
partition function, inside-outside algorithm 



1. Introduction 

The cross-moments of random variables modeled with stochastic context-free grammars (SCFGs) are
important quantities for SCFG parameter estimation [8]. They are defined as the expected value of the
product of integer powers of the entries of a random vector variable, which can represent the string or
derivation length, the number of rule occurrences in a derivation, or the uncertainty associated with the
occurring rule. The expectation can be taken either with respect to the sample space of all SCFG
derivations, or with respect to the sample space of all derivations which generate a string belonging to
the language of the grammar. Throughout this paper, the name cross-moments is used in the former case,
while in the latter case we talk about conditional cross-moments.

The computation of cross-moments may become demanding if the sample space is large. In the past, this
problem has been widely studied, but mainly for the cross-moments of scalar variables (called simply
moments) and up to the second order. The computation of first order moments, such as the expected length
of derivations and the expected string length, is given in [19]. The computation of SCFG entropy is
considered in [12]. The procedure for computing the moments of string and derivation length is given in
[8], where explicit formulas for the moments up to the second order are derived. First order conditional
cross-moments are considered in [9], where the algorithm for conditional SCFG entropy is derived. A more
general algorithm for computing the conditional cross-moments of a vector variable of the second order is
derived in [11].

In this paper we give the recursive formulas for computing the cross-moments and the conditional cross- 
moments of a vector variable of an arbitrary order. The formulas are derived by differentiating the recursive 
equations for the moment generating function which are obtained from the algorithms for computing 
the partition function of a SCFG [13] for the cross-moments and with the inside algorithm jioh , |f§] for the
conditional cross-moments. 

The paper is organized as follows. Section 2 introduces multi-index notation, which is used throughout
the paper, and reviews some preliminary notions about the generalized Leibniz's formula, basic algebraic
structures, and context-free grammars. In Section 3 we give the formal definition of SCFG cross-moments and
the moment generating function. The recursive equations for cross-moments are given in Section 4 for the case
when the sample space is the set of all derivations, while in Section 5 we consider the set of all derivations
which generate a string belonging to the language of the grammar as the sample space.



2. Preliminaries 

This section gives some basic definitions and theorems which are used in the paper. We review multi- 
index formulation of the Generalized Leibniz's formula IHHl . and basic notions from the theory of weighted 
context free grammars, according to [13] and [14].

2.1. Generalized Leibniz's formula 

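For any multi-index $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{N}_0^d$, we define its length as the sum $|\alpha| = \alpha_1 + \alpha_2 + \cdots + \alpha_d$. The
multi-index factorial is $\alpha! = \alpha_1! \cdots \alpha_d!$. The zero multi-index is $0 = (0, \dots, 0)$.

If $\beta = (\beta_1, \dots, \beta_d) \in \mathbb{N}_0^d$, we write $\beta < \alpha$ if $\beta_i < \alpha_i$ for all $i = 1, \dots, d$. We write $\beta \le \alpha$ provided $\beta_i \le \alpha_i$ for
$i = 1, \dots, d$. In such a case, we set $\alpha \pm \beta = (\alpha_1 \pm \beta_1, \dots, \alpha_d \pm \beta_d)$, and

\[ \binom{\alpha}{\beta} = \frac{\alpha!}{\beta!\,(\alpha - \beta)!} = \prod_{i=1}^{d} \binom{\alpha_i}{\beta_i}. \tag{2} \]

If $\beta_1 + \cdots + \beta_N = \alpha$, we define the multinomial coefficients by

\[ \binom{\alpha}{\beta_1, \dots, \beta_N} = \frac{\alpha!}{\beta_1! \cdots \beta_N!}. \]

For $z = (z_1, \dots, z_d) \in \mathbb{R}^d$ and $\gamma = (\gamma_1, \dots, \gamma_d) \in \mathbb{N}_0^d$ the multi-index power is defined by:

\[ z^{\gamma} = z_1^{\gamma_1} \cdots z_d^{\gamma_d}. \tag{3} \]

With these settings, the multinomial theorem [4] can be expressed as

\[ \Big( \sum_{i=1}^{N} z_i \Big)^{\gamma} = \sum_{\beta_1 + \cdots + \beta_N = \gamma} \binom{\gamma}{\beta_1, \dots, \beta_N} \prod_{i=1}^{N} z_i^{\beta_i} \tag{4} \]

for $z_1, \dots, z_N \in \mathbb{R}^d$ and $\gamma = (\gamma_1, \dots, \gamma_d) \in \mathbb{N}_0^d$.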
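The multi-index notation above can be checked numerically. The following is a minimal Python sketch, not part of the original paper: the helper names (`mi_factorial`, `mi_power`, `mi_multinomial`, `decompositions`) are hypothetical, multi-indices are represented as plain tuples of non-negative integers, and the final lines compare both sides of the multinomial theorem (4) for one small integer example.

```python
from itertools import product
from math import factorial, prod


def mi_factorial(alpha):
    """Multi-index factorial: alpha! = alpha_1! * ... * alpha_d!."""
    return prod(factorial(a) for a in alpha)


def mi_power(z, gamma):
    """Multi-index power: z^gamma = z_1^gamma_1 * ... * z_d^gamma_d."""
    return prod(zi ** gi for zi, gi in zip(z, gamma))


def mi_multinomial(alpha, betas):
    """alpha! / (beta_1! ... beta_N!), assuming the betas sum to alpha."""
    return mi_factorial(alpha) // prod(mi_factorial(b) for b in betas)


def decompositions(alpha, n):
    """Yield all (beta_1, ..., beta_n) with beta_1 + ... + beta_n = alpha."""
    if n == 1:
        yield (tuple(alpha),)
        return
    for first in product(*(range(a + 1) for a in alpha)):
        rest = tuple(a - f for a, f in zip(alpha, first))
        for tail in decompositions(rest, n - 1):
            yield (first,) + tail


def multinomial_rhs(zs, gamma):
    """Right-hand side of the multinomial theorem (4)."""
    return sum(
        mi_multinomial(gamma, betas)
        * prod(mi_power(z, b) for z, b in zip(zs, betas))
        for betas in decompositions(gamma, len(zs))
    )


# Three vectors in R^2 (integers, so the comparison is exact).
zs = [(1, 2), (3, 4), (5, 6)]
gamma = (2, 3)
z_sum = tuple(map(sum, zip(*zs)))        # component-wise sum: (9, 12)
lhs = mi_power(z_sum, gamma)             # 9**2 * 12**3
rhs = multinomial_rhs(zs, gamma)
```

Enumerating all decompositions is exponential in the number of summands, so this sketch is only meant for checking small cases.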

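Let $\nu = (\nu_1, \dots, \nu_d)$ and let $C_\nu$ denote the set of functions $u : \mathbb{R}^d \to \mathbb{R}$ which have the $\nu$-th partial derivative
at zero. For every $u \in C_\nu$ we define the mapping $D_\nu : C_\nu \to \mathbb{R}$ with

\[ D_\nu\{u(t)\} = \frac{\partial^{|\nu|} u(t)}{\partial t_1^{\nu_1} \cdots \partial t_d^{\nu_d}} \bigg|_{t=0}. \]

Note that $D_0\{u(t)\} = u(0)$. According to the generalized Leibniz's formula ll^l . the following equality holds

\[ D_\alpha\{F \cdot G\} = \sum_{\beta \le \alpha} \binom{\alpha}{\beta} D_\beta\{F\} \cdot D_{\alpha-\beta}\{G\} \tag{6} \]

for all $F, G \in C_\alpha$. The derivative of the product of more than two functions can be found according to [18]

\[ D_\nu\Big\{ \prod_{i=1}^{m} F_i \Big\} = \sum_{\beta_1 + \cdots + \beta_m = \nu} \binom{\nu}{\beta_1, \dots, \beta_m} \prod_{i=1}^{m} D_{\beta_i}\{F_i\} \tag{7} \]

for all $F_i \in C_\nu$; $i = 1, \dots, m$.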
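As an illustration of the Leibniz formula for two functions, the sketch below checks it on polynomials in two variables. It is only a sanity check under stated assumptions: polynomials are stored as dictionaries mapping exponent tuples to coefficients (a representation chosen here, not taken from the paper), so the $\alpha$-th partial derivative at zero is simply $\alpha!$ times the coefficient of $t^\alpha$. The names `deriv_at_zero`, `poly_mul` and `leibniz_rhs` are hypothetical.

```python
from itertools import product
from math import comb, factorial, prod


def deriv_at_zero(poly, alpha):
    """D_alpha{F}: alpha-th partial derivative at t = 0 of a polynomial
    stored as {exponent tuple: coefficient}."""
    return prod(factorial(a) for a in alpha) * poly.get(tuple(alpha), 0)


def poly_mul(f, g):
    """Product of two polynomials in the dictionary representation."""
    out = {}
    for ef, cf in f.items():
        for eg, cg in g.items():
            e = tuple(a + b for a, b in zip(ef, eg))
            out[e] = out.get(e, 0) + cf * cg
    return out


def leibniz_rhs(f, g, alpha):
    """Sum over beta <= alpha of binom(alpha, beta) D_beta{F} D_(alpha-beta){G}."""
    total = 0
    for beta in product(*(range(a + 1) for a in alpha)):
        binom = prod(comb(a, b) for a, b in zip(alpha, beta))
        rest = tuple(a - b for a, b in zip(alpha, beta))
        total += binom * deriv_at_zero(f, beta) * deriv_at_zero(g, rest)
    return total


# F = 1 + 2 t1 + 3 t1 t2 + 5 t2^2,  G = 4 + t2 + 2 t1 t2 + 7 t1^2 t2
F = {(0, 0): 1, (1, 0): 2, (1, 1): 3, (0, 2): 5}
G = {(0, 0): 4, (0, 1): 1, (1, 1): 2, (2, 1): 7}
alpha = (2, 2)
lhs = deriv_at_zero(poly_mul(F, G), alpha)   # derivative of the product
rhs = leibniz_rhs(F, G, alpha)               # Leibniz expansion
```

Both sides use exact integer arithmetic, so they can be compared with `==`.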
2.2. Semirings

A monoid is a triple $(K, \oplus, 0)$ where $\oplus$ is an associative binary operation on the set $K$, and $0$ is the identity
element for $\oplus$, i.e., $a \oplus 0 = 0 \oplus a = a$, for all $a \in K$. A monoid is commutative if the operation $\oplus$ is commutative.

Example 2.1. Let $\Sigma$ be a non-empty set. The free monoid $\Sigma^* = (\Sigma^*, \cdot, \varepsilon)$ over $\Sigma$ is a monoid, where the carrier set
$\Sigma^* = \{ a_1 \cdots a_n \mid n \in \mathbb{N}_0,\ a_i \in \Sigma\ (1 \le i \le n) \}$ is the set of all strings over $\Sigma$ and $\varepsilon$ is the (unique) empty string of
length zero. The operation $\cdot$ denotes the composition (concatenation) of strings defined by $u_1 \cdot u_2 = u_1 u_2$ for
all $u_1, u_2 \in \Sigma^*$.

A semiring is a tuple $(K, \oplus, \otimes, 0, 1)$ such that

1. $(K, \oplus, 0)$ is a commutative monoid with $0$ as the identity element for $\oplus$,

2. $(K, \otimes, 1)$ is a monoid with $1$ as the identity element for $\otimes$,

3. $\otimes$ distributes over $\oplus$, i.e. $(a \oplus b) \otimes c = (a \otimes c) \oplus (b \otimes c)$ and $c \otimes (a \oplus b) = (c \otimes a) \oplus (c \otimes b)$, for all $a, b, c$ in $K$,

4. $0$ is an annihilator for $\otimes$, i.e., $a \otimes 0 = 0 \otimes a = 0$, for every $a$ in $K$.

A semiring is commutative if the operation $\otimes$ is commutative. The operations $\oplus$ and $\otimes$ are called the addition
and the multiplication in $K$. For a topology $\tau$ we define the topological semiring as a pair $(K, \tau)$.

Example 2.2. The semiring of $\nu$-continuous functions at zero is the semiring $(C_\nu, +, \cdot, 0, 1)$ with the standard
uniform norm topology, where the sum and the product of functions $F, G \in C_\nu$ are defined through the sum
and the product of real numbers, $(F + G)(t) = F(t) + G(t)$, $(F \cdot G)(t) = F(t) \cdot G(t)$, and the identities $0, 1 \in C_\nu$
are the zero and identity functions defined by $0(t) = 0$ and $1(t) = 1$, for all $t \in \mathbb{R}^d$.
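The semiring axioms listed above translate directly into a small data structure. The sketch below is an illustrative assumption rather than anything defined in the paper: the `Semiring` class, the two instances and the `check_axioms` helper are hypothetical names, and the axiom check only tests a finite list of sample values, so it can refute but never prove that a structure is a semiring.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    add: Callable[[Any, Any], Any]   # the operation written as (+) in the text
    mul: Callable[[Any, Any], Any]   # the operation written as (x) in the text
    zero: Any                        # identity for add, annihilator for mul
    one: Any                         # identity for mul


# Probability semiring on non-negative reals and the Boolean semiring.
PROBABILITY = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)


def check_axioms(sr, samples):
    """Test the four semiring axioms on every triple of sample values."""
    for a, b, c in product(samples, repeat=3):
        ok = (
            sr.add(a, sr.add(b, c)) == sr.add(sr.add(a, b), c)      # add associative
            and sr.add(a, b) == sr.add(b, a)                        # add commutative
            and sr.add(a, sr.zero) == a                             # zero is add identity
            and sr.mul(a, sr.mul(b, c)) == sr.mul(sr.mul(a, b), c)  # mul associative
            and sr.mul(a, sr.one) == a and sr.mul(sr.one, a) == a   # one is mul identity
            and sr.mul(sr.add(a, b), c) == sr.add(sr.mul(a, c), sr.mul(b, c))
            and sr.mul(c, sr.add(a, b)) == sr.add(sr.mul(c, a), sr.mul(c, b))
            and sr.mul(a, sr.zero) == sr.zero and sr.mul(sr.zero, a) == sr.zero
        )
        if not ok:
            return False
    return True


# Dyadic sample values keep the floating-point comparisons exact.
prob_ok = check_axioms(PROBABILITY, [0.0, 0.5, 1.0, 2.0])
bool_ok = check_axioms(BOOLEAN, [False, True])
```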

2.3. Weighted and stochastic context-free grammars 

By a weighted context-free grammar (WCFG) over a commutative semiring $(K, +, \cdot, 0, 1)$ we mean a tuple
$G = (\Sigma, N, S, R, \mu)$, where

• $\Sigma = \{w_1, \dots, w_{|\Sigma|}\}$ is a finite set of terminals,

• $N = \{A_1, \dots, A_{|N|}\}$ is a finite set of nonterminals disjoint with $\Sigma$,

• $S \in N$ is called the start symbol (throughout the paper it is usually assumed that $S = A_1$),

• $R \subseteq N \times (\Sigma \cup N)^*$ is a finite set of rules. A rule $(A, \alpha) \in R$ is commonly written as $A \to \alpha$, where the
nonterminal $A$ is called the premise. The set of all rules $A_i \to B_{i,j}$, $B_{i,j} \in (N \cup \Sigma)^*$, will be denoted by $R_i$.

• $\mu : R \to K$ is the function called weight.

The left-most rewriting relation $\Rightarrow$ associated with $G$ is defined as the set of triples $(\alpha, \pi, \beta) \in (\Sigma \cup N)^* \times
R \times (\Sigma \cup N)^*$ for which there are a terminal string $u \in \Sigma^*$, a string $\delta \in (\Sigma \cup N)^*$, along with a
nonterminal $A \in N$ and a string $\gamma \in (\Sigma \cup N)^*$ such that $\alpha = uA\delta$, $\beta = u\gamma\delta$ and $\pi = A \to \gamma$ is a rule from $R$.

The left-most relation triple $(\alpha, \pi, \beta)$ will be denoted by $\alpha \overset{\pi}{\Rightarrow} \beta$. The left-most derivation (in the further text, the
derivation) in this grammar is a string $\pi_1 \cdots \pi_n \in R^*$ for which there are strings of grammar symbols $\alpha, \beta \in (\Sigma \cup N)^*$ such
that we can derive $\beta$ from $\alpha$ by applying the rewriting rules $\pi_1, \dots, \pi_n$: $\alpha \overset{\pi_1}{\Rightarrow} \cdots \overset{\pi_n}{\Rightarrow} \beta$. The weight function
is extended to derivations such that $\mu(\pi_1 \cdots \pi_N) = \mu(\pi_1) \cdots \mu(\pi_N)$, for all $\pi_1 \cdots \pi_N \in R^*$. A nonterminal $A$
is productive if there exists a derivation $\pi_1 \cdots \pi_k$ such that $A \overset{\pi_1}{\Rightarrow} \cdots \overset{\pi_k}{\Rightarrow} u$, $u \in \Sigma^*$. A nonterminal $A$ is accessible
from a nonterminal $B$ if there exists a derivation $\pi_1 \cdots \pi_k$ such that $B \overset{\pi_1}{\Rightarrow} \cdots \overset{\pi_k}{\Rightarrow} \eta A \xi$, where $\eta, \xi \in (\Sigma \cup N)^*$ (if $A$
is accessible from $S$, then it is simply called accessible). A nonterminal $A$ is useful if it is accessible and productive
(otherwise, it is useless).
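The notions of productive, accessible and useful nonterminals can be computed with two standard fixed-point passes. The sketch below is an illustration under assumptions that are not in the paper: rules are stored as `(premise, right-hand-side tuple)` pairs, the toy grammar is invented for the example, and the start symbol is counted as accessible from itself.

```python
def productive(rules, nonterminals):
    """Nonterminals from which some terminal string can be derived."""
    found = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            # A premise is productive once every nonterminal on the
            # right-hand side of one of its rules is productive.
            if lhs not in found and all(
                s in found or s not in nonterminals for s in rhs
            ):
                found.add(lhs)
                changed = True
    return found


def accessible(rules, start, nonterminals):
    """Nonterminals reachable from `start` (including `start` itself)."""
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for lhs, rhs in rules:
            if lhs != current:
                continue
            for s in rhs:
                if s in nonterminals and s not in seen:
                    seen.add(s)
                    stack.append(s)
    return seen


# Hypothetical toy grammar: A never terminates, B is never reached from S.
NONTERMINALS = {"S", "A", "B"}
RULES = [
    ("S", ("A", "b")),
    ("S", ("c",)),
    ("A", ("A", "a")),
    ("B", ("b",)),
]

productive_set = productive(RULES, NONTERMINALS)
accessible_set = accessible(RULES, "S", NONTERMINALS)
useful_set = productive_set & accessible_set
```

In this toy grammar only `S` is useful, so a reduced version of the grammar would keep the single rule `S -> c`.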

A weighted context-free grammar $G = (\Sigma, N, A_1, R, p)$ over the probability semiring $(\mathbb{R}_+, +, \cdot, 0, 1)$
is called a stochastic context-free grammar (SCFG) if the weight $p$ maps all rules to the real unit interval
$[0, 1]$. A SCFG is reduced if $p(A \to \gamma) > 0$ for all $A \to \gamma \in R$ and each nonterminal $A$, and all nonterminals
are useful. In this paper we consider only reduced SCFGs. In addition, we assume that the SCFG is proper,
which means that the weight function $p$ gives a probability distribution over the rules that can be applied,
i.e. $\sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) = 1$ for all $1 \le i \le |N|$.

For a stochastic context-free grammar $G = (\Sigma, N, A_1, R, p)$ we define the subgrammar $G_i = (\Sigma, N'_i, A_i, R'_i, p'_i)$
with the start symbol $A_i$, where $N'_i$ is the set which consists of $A_i$ and the nonterminals accessible from $A_i$,
$R'_i \subseteq R$ is the set of rules in which only nonterminals from $N'_i$ appear as premises, and $p'_i(\pi) = p(\pi)$ for each
$\pi \in R'_i$. Note that if $G$ is reduced, then $G_i$ also has this property.

3. Moment generating function of SCFG 

Let $G = (\Sigma, N, A_1, R, p)$ be a stochastic context-free grammar, $\Omega$ the set of all derivations in $G$ and $\Omega_i$ the
set of all derivations starting at $A_i \in N$. The grammar $G$ is consistent if

\[ \sum_{\pi \in \Omega_i} p(\pi) = 1 \]

for $1 \le i \le |N|$. Booth and Thompson [1] gave the consistency condition for the start symbol $S = A_1$ by the
following theorem.

Theorem 3.1. A reduced stochastic context-free grammar $G$ is consistent if $\rho(M) < 1$, where $\rho(M)$ is the absolute
value of the largest eigenvalue of the expectation matrix $M = [M_{i,n}]$, $1 \le i, n \le |N|$, defined by

\[ M_{i,n} = \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j})\, r_n(i, j), \tag{8} \]

where $r_n(i, j)$ denotes the number of times the nonterminal $A_n$ appears on the right hand side of the rule $\pi = A_i \to B_{i,j}$.
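The expectation matrix of equation (8) is straightforward to build from a rule table. The following sketch assumes a hypothetical two-nonterminal grammar and a dictionary representation of the rules, neither of which comes from the paper. The spectral radius is estimated with plain power iteration, which is a swapped-in numerical shortcut: it works for this example but is not guaranteed to converge for every non-negative matrix, so an eigenvalue routine from a linear algebra library would be the safer choice in practice.

```python
def expectation_matrix(rules, order):
    """M[i][n] = sum_j p(A_i -> B_ij) * r_n(i, j), as in equation (8)."""
    return [
        [sum(p * rhs.count(a_n) for p, rhs in rules[a_i]) for a_n in order]
        for a_i in order
    ]


def spectral_radius_estimate(m, iterations=300):
    """Power iteration with max-norm scaling (illustrative only)."""
    x = [1.0] * len(m)
    rho = 0.0
    for _ in range(iterations):
        y = [sum(mij * xj for mij, xj in zip(row, x)) for row in m]
        rho = max(abs(v) for v in y)
        if rho == 0.0:
            return 0.0
        x = [v / rho for v in y]
    return rho


# Hypothetical proper grammar:
#   S -> S S (0.3) | A a (0.7)
#   A -> A b (0.5) | b   (0.5)
RULES = {
    "S": [(0.3, ("S", "S")), (0.7, ("A", "a"))],
    "A": [(0.5, ("A", "b")), (0.5, ("b",))],
}
ORDER = ["S", "A"]

M = expectation_matrix(RULES, ORDER)      # [[0.6, 0.7], [0.0, 0.5]]
rho = spectral_radius_estimate(M)         # eigenvalues of M are 0.6 and 0.5
is_consistent = rho < 1.0
```

Since the matrix is upper triangular, its eigenvalues can be read off the diagonal, which gives an independent check of the estimate.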

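Note that the expectation matrices of all subgrammars $G_i$ are the principal submatrices of $M$, and
according to @] (Corollary 8.1.20) $\rho(M^{(i)}) \le \rho(M)$ and $G_i$ are also consistent, i.e.,

\[ \sum_{\pi \in \Omega_i} p(\pi) = 1. \tag{9} \]

For the vector function $X : \Omega \to \mathbb{R}^D$, we define the $i$-th moment generating function (MGF) of $X$ as the
function $M_X^{(i)} : \mathbb{R}^D \to \mathbb{R}$ where

\[ M_X^{(i)}(t) = \sum_{\pi \in \Omega_i} p(\pi)\, e^{t^T X(\pi)}, \tag{10} \]

for all $t \in \mathbb{R}^D$. The $i$-th cross-moment of an order $\nu = (\nu_1, \dots, \nu_D)$ is defined with

\[ m_i^{(\nu)}\{X\} = \sum_{\pi \in \Omega_i} p(\pi) \cdot X_1(\pi)^{\nu_1} \cdots X_D(\pi)^{\nu_D} = \sum_{\pi \in \Omega_i} p(\pi)\, X(\pi)^{\nu}, \tag{11} \]

where $X(\pi) = [X_1(\pi), \dots, X_D(\pi)]^T$. The cross-moment can be retrieved from the MGF by differentiating [6]:

\[ m_i^{(\nu)}\{X\} = \frac{\partial^{|\nu|} M_X^{(i)}(t)}{\partial t_1^{\nu_1} \cdots \partial t_D^{\nu_D}} \bigg|_{t=0} = D_\nu\{M_X^{(i)}\}. \tag{12} \]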
m (v) X = ^— I „ = DJM®}, (12) 
The $i$-th conditional moment generating function, $M_{X|u}^{(i)}$, and the $i$-th conditional cross-moment of an order $\nu$,
$m_i^{(\nu)}\{X \mid u\}$, are defined in the similar manner if the summing is performed over the set of all derivations
starting at $A_i$ and ending with a string $u \in \Sigma^*$.

The direct computation of (11) by enumerating all derivations is inefficient, since it requires $O(|\Omega|)$
operations, and it even becomes infeasible when $\Omega$ is an infinite set. On the other hand, if we can derive
expressions for efficient computation of the moment generating function (10), the moment can be retrieved
by differentiation. The expression can be obtained if the random vector $X$ can be represented as the sum of
random vectors $Y : R \to \mathbb{R}^D$:

\[ X(\pi_1 \cdots \pi_N) = Y(\pi_1) + \cdots + Y(\pi_N), \tag{13} \]

for all $\pi_1 \cdots \pi_N \in \Omega$. Then, for $G = (\Sigma, N, A_1, R, p)$, we can construct $\widetilde{G} = (\Sigma, N, A_1, R, \mu)$, the moment
generating grammar, with the weight function $\mu : R \to C_\nu$ defined with

\[ \mu(\pi) = p(\pi)\, e^{t^T Y(\pi)} \tag{14} \]

for all $\pi \in R$. A derivation $\pi = \pi_1 \cdots \pi_N$ in $G$ with the weight $p(\pi) = p(\pi_1) \cdots p(\pi_N)$ is also a derivation in $\widetilde{G}$,
for which the weight is given with

\[ \mu(\pi) = \mu(\pi_1) \cdots \mu(\pi_N) = p(\pi_1)\, e^{t^T Y(\pi_1)} \cdots p(\pi_N)\, e^{t^T Y(\pi_N)} = p(\pi)\, e^{t^T X(\pi)}. \tag{15} \]

The MGF of $X$ can now be expressed as the sum of derivation weights in $\widetilde{G}$ as

\[ M_X(t) = \sum_{\pi \in \Omega} p(\pi)\, e^{t^T X(\pi)} = \sum_{\pi \in \Omega} \mu(\pi). \tag{16} \]



Thus, the problem of MGF computation is reduced to the problem of the partition function computation [13], 
and the conditional MGF can be computed using the inside algorithm ^\ over the semiring of v-continuous 
functions at zero. In the following sections we show how the expressions for the cross-moments and 
conditional cross-moments can be derived from (16). 



4. Cross-moments computation of SCFG 

Let $G = (\Sigma, N, A_1, R, \mu)$ be a weighted context-free grammar over a commutative semiring $(K, +, \cdot, 0, 1)$
endowed with a topology $\tau$. Assuming that for $1 \le i \le |N|$ the infinite collections $\{\mu(\pi)\}_{\pi \in \Omega_i}$ are summable
in $\tau$, and that the distributive law for infinite sums holds, we define the partition function $Z : N \to K$ which
to every nonterminal $A_i \in N$ associates the sum

\[ Z_i = \sum_{\pi \in \Omega_i} \mu(\pi). \tag{17} \]

By factoring out the first rewriting of each derivation in the sum, using the distributive law, the partition
function can be expressed with the system [13]:

\[ Z_i = \sum_{j=1}^{|R_i|} \mu(A_i \to B_{i,j}) \cdot \prod_{k=1}^{|N|} Z_k^{\,r_k(B_{i,j})}, \tag{18} \]

where $1 \le i \le |N|$.
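For real-valued weights, one simple way to approximate a solution of system (18) is to iterate its right-hand side starting from zero. The sketch below is an assumption-laden illustration, not necessarily the method of [13]: the function name, the rule representation and both toy grammars are hypothetical, and the iteration is only expected to converge when the sums in (17) are finite. Starting from zero, the iterates increase towards the smallest non-negative solution.

```python
def partition_function(rules, iterations=500):
    """Iterate Z_i <- sum_j w(A_i -> B_ij) * prod_k Z_k^{r_k(B_ij)} from Z = 0.

    rules: {nonterminal: [(weight, rhs_tuple), ...]}; every symbol that is
    not a key of `rules` is treated as a terminal and contributes a factor 1.
    """
    z = {a: 0.0 for a in rules}
    for _ in range(iterations):
        updated = {}
        for a, alternatives in rules.items():
            total = 0.0
            for weight, rhs in alternatives:
                term = weight
                for symbol in rhs:
                    if symbol in rules:
                        term *= z[symbol]
                total += term
            updated[a] = total
        z = updated
    return z


# Consistent grammar: S -> S S (0.3) | a (0.7); expectation matrix [0.6].
z_consistent = partition_function({"S": [(0.3, ("S", "S")), (0.7, ("a",))]})

# Inconsistent grammar: S -> S S (0.6) | a (0.4); expectation matrix [1.2].
# Z = 0.6 Z^2 + 0.4 has roots 2/3 and 1; the iteration finds the smaller one.
z_inconsistent = partition_function({"S": [(0.6, ("S", "S")), (0.4, ("a",))]})
```

The second grammar shows why the consistency condition of Theorem 3.1 matters: when it fails, the derivation probabilities sum to less than one.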

Now, let $\widetilde{G} = (\Sigma, N, A_1, R, \mu)$ be the moment generating grammar for $G = (\Sigma, N, A_1, R, p)$ with

\[ \mu(\pi) = p(\pi)\, e^{t^T Y(\pi)} \tag{19} \]

and

\[ X(\pi_1 \cdots \pi_N) = Y(\pi_1) + \cdots + Y(\pi_N). \tag{20} \]

According to the discussion made in Section 3, the value of the partition function at the nonterminal $A_i$
corresponds to the $i$-th moment generating function, $Z_i = M_X^{(i)}$, and the $i$-th cross-moment

\[ m_i^{(\alpha)}\{X\} = D_\alpha\{M_X^{(i)}\} = D_\alpha\{Z_i\} = \sum_{\pi \in \Omega_i} p(\pi)\, X(\pi)^{\alpha}, \tag{21} \]

and it can be computed by differentiating (18) and solving the resulting equation. Note that

\[ m_i^{(0)}\{X\} = D_0\{Z_i\} = \Big( \sum_{\pi \in \Omega_i} p(\pi)\, e^{t^T X(\pi)} \Big) \Big|_{t=0} = \sum_{\pi \in \Omega_i} p(\pi) = 1, \tag{22} \]

for all $1 \le i \le |N|$. The cross-moments of higher order can be obtained by applying the generalized Leibniz's
formula (6) to (18), which leads us to the following system


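\[ m_i^{(\alpha)}\{X\} = \sum_{j=1}^{|R_i|} \sum_{\beta \le \alpha} \binom{\alpha}{\beta} D_{\alpha-\beta}\{\mu(A_i \to B_{i,j})\} \cdot D_\beta\Big\{ \prod_{k=1}^{|N|} Z_k^{\,r_k(B_{i,j})} \Big\}, \tag{23} \]

where,

\[ D_{\alpha-\beta}\{\mu(A_i \to B_{i,j})\} = p(A_i \to B_{i,j}) \cdot Y(A_i \to B_{i,j})^{\alpha-\beta}, \tag{24} \]

since $\mu(\pi) = p(\pi)\, e^{t^T Y(\pi)}$, for $\pi \in R$. According to the generalized Leibniz's rule (7), we have

\[ D_\beta\Big\{ \prod_{k=1}^{|N|} Z_k^{\,r_k(B_{i,j})} \Big\} = \sum_{\gamma_1 + \cdots + \gamma_{|N|} = \beta} \binom{\beta}{\gamma_1, \dots, \gamma_{|N|}} \prod_{n=1}^{|N|} D_{\gamma_n}\Big\{ Z_n^{\,r_n(B_{i,j})} \Big\} \tag{25} \]

and

\[ D_{\gamma_n}\Big\{ Z_n^{\,r_n(B_{i,j})} \Big\} = D_{\gamma_n}\Big\{ \prod_{l=1}^{r_n(B_{i,j})} Z_n \Big\} = \sum_{\delta_1 + \cdots + \delta_{r_n(B_{i,j})} = \gamma_n} \binom{\gamma_n}{\delta_1, \dots, \delta_{r_n(B_{i,j})}} \prod_{l=1}^{r_n(B_{i,j})} m_n^{(\delta_l)}\{X\}. \tag{26} \]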
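By substituting (26) and (25) in (23), we obtain:

\[ m_i^{(\alpha)}\{X\} = \sum_{j=1}^{|R_i|} \sum_{\beta \le \alpha} Q_{i,j}(\alpha, \beta), \tag{27} \]

where

\[ Q_{i,j}(\alpha, \beta) = \binom{\alpha}{\beta}\, p(A_i \to B_{i,j})\, Y(A_i \to B_{i,j})^{\alpha-\beta} \sum_{\gamma_1 + \cdots + \gamma_{|N|} = \beta} \binom{\beta}{\gamma_1, \dots, \gamma_{|N|}} \prod_{n=1}^{|N|} \sum_{\delta_1 + \cdots + \delta_{r_n(B_{i,j})} = \gamma_n} \binom{\gamma_n}{\delta_1, \dots, \delta_{r_n(B_{i,j})}} \prod_{l=1}^{r_n(B_{i,j})} m_n^{(\delta_l)}\{X\}. \tag{28} \]

To solve the system (27), we split it into two parts: one depending and the other not depending on $m_i^{(\alpha)}\{X\}$:

\[ m_i^{(\alpha)}\{X\} = \sum_{j=1}^{|R_i|} Q_{i,j}(\alpha, \alpha) + \sum_{j=1}^{|R_i|} \sum_{\beta < \alpha} Q_{i,j}(\alpha, \beta), \tag{29} \]

where

\[ Q_{i,j}(\alpha, \alpha) = p(A_i \to B_{i,j}) \cdot W_{i,j}(\alpha) \tag{30} \]

and

\[ W_{i,j}(\alpha) = \sum_{\gamma_1 + \cdots + \gamma_{|N|} = \alpha} \binom{\alpha}{\gamma_1, \dots, \gamma_{|N|}} \prod_{n=1}^{|N|} \sum_{\delta_1 + \cdots + \delta_{r_n(B_{i,j})} = \gamma_n} \binom{\gamma_n}{\delta_1, \dots, \delta_{r_n(B_{i,j})}} \prod_{l=1}^{r_n(B_{i,j})} m_n^{(\delta_l)}\{X\}. \tag{31} \]

Further, if we set

\[ H_{i,j}(\gamma_1, \dots, \gamma_{|N|}) = \binom{\alpha}{\gamma_1, \dots, \gamma_{|N|}} \prod_{n=1}^{|N|} \sum_{\delta_1 + \cdots + \delta_{r_n(B_{i,j})} = \gamma_n} \binom{\gamma_n}{\delta_1, \dots, \delta_{r_n(B_{i,j})}} \prod_{l=1}^{r_n(B_{i,j})} m_n^{(\delta_l)}\{X\}, \tag{32} \]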
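the expression for $W_{i,j}(\alpha)$ can be rewritten as:

\[ W_{i,j}(\alpha) = \sum_{n=1}^{|N|} H_{i,j}^{(n)}(\gamma_1, \dots, \gamma_{|N|}) + \sum_{\substack{\gamma_1 + \cdots + \gamma_{|N|} = \alpha \\ \gamma_1, \dots, \gamma_{|N|} < \alpha}} H_{i,j}(\gamma_1, \dots, \gamma_{|N|}), \tag{33} \]

where $H_{i,j}^{(n)}(\gamma_1, \dots, \gamma_{|N|})$ stands for $H_{i,j}(\gamma_1, \dots, \gamma_{|N|})$ with $\gamma_n = \alpha$ and all other $\gamma$-s equal to zero, which is,
according to (32),

\[ H_{i,j}^{(n)}(\gamma_1, \dots, \gamma_{|N|}) = \sum_{\delta_1 + \cdots + \delta_{r_n(B_{i,j})} = \alpha} \binom{\alpha}{\delta_1, \dots, \delta_{r_n(B_{i,j})}} \prod_{l=1}^{r_n(B_{i,j})} m_n^{(\delta_l)}\{X\} \cdot \prod_{\substack{k=1 \\ k \ne n}}^{|N|} \prod_{l=1}^{r_k(B_{i,j})} m_k^{(0)}\{X\}. \tag{34} \]

Finally, after use of $m_k^{(0)}\{X\} = 1$ we obtain

\[ H_{i,j}^{(n)}(\gamma_1, \dots, \gamma_{|N|}) = \sum_{\delta_1 + \cdots + \delta_{r_n(B_{i,j})} = \alpha} \binom{\alpha}{\delta_1, \dots, \delta_{r_n(B_{i,j})}} \prod_{l=1}^{r_n(B_{i,j})} m_n^{(\delta_l)}\{X\}, \tag{35} \]

which can be rewritten using the same procedure as

\[ H_{i,j}^{(n)}(\gamma_1, \dots, \gamma_{|N|}) = \sum_{c=1}^{r_n(B_{i,j})} m_n^{(\alpha)}\{X\} \prod_{\substack{l=1 \\ l \ne c}}^{r_n(B_{i,j})} m_n^{(0)}\{X\} + \sum_{\substack{\delta_1 + \cdots + \delta_{r_n(B_{i,j})} = \alpha \\ \delta_1, \dots, \delta_{r_n(B_{i,j})} < \alpha}} \binom{\alpha}{\delta_1, \dots, \delta_{r_n(B_{i,j})}} \prod_{l=1}^{r_n(B_{i,j})} m_n^{(\delta_l)}\{X\} \]
\[ = r_n(B_{i,j}) \cdot m_n^{(\alpha)}\{X\} + \sum_{\substack{\delta_1 + \cdots + \delta_{r_n(B_{i,j})} = \alpha \\ \delta_1, \dots, \delta_{r_n(B_{i,j})} < \alpha}} \binom{\alpha}{\delta_1, \dots, \delta_{r_n(B_{i,j})}} \prod_{l=1}^{r_n(B_{i,j})} m_n^{(\delta_l)}\{X\}. \tag{36} \]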

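By substituting (36) in (33) it follows that:

\[ W_{i,j}(\alpha) = \sum_{n=1}^{|N|} r_n(B_{i,j}) \cdot m_n^{(\alpha)}\{X\} + \sum_{n=1}^{|N|} \sum_{\substack{\delta_1 + \cdots + \delta_{r_n(B_{i,j})} = \alpha \\ \delta_1, \dots, \delta_{r_n(B_{i,j})} < \alpha}} \binom{\alpha}{\delta_1, \dots, \delta_{r_n(B_{i,j})}} \prod_{l=1}^{r_n(B_{i,j})} m_n^{(\delta_l)}\{X\} + \sum_{\substack{\gamma_1 + \cdots + \gamma_{|N|} = \alpha \\ \gamma_1, \dots, \gamma_{|N|} < \alpha}} H_{i,j}(\gamma_1, \dots, \gamma_{|N|}). \tag{37} \]

Further, by substituting (37) and (30) in (29), the moment can be expressed with:

\[ m_i^{(\alpha)}\{X\} = \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \sum_{n=1}^{|N|} r_n(B_{i,j}) \cdot m_n^{(\alpha)}\{X\} + \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \sum_{n=1}^{|N|} \sum_{\substack{\delta_1 + \cdots + \delta_{r_n(B_{i,j})} = \alpha \\ \delta_1, \dots, \delta_{r_n(B_{i,j})} < \alpha}} \binom{\alpha}{\delta_1, \dots, \delta_{r_n(B_{i,j})}} \prod_{l=1}^{r_n(B_{i,j})} m_n^{(\delta_l)}\{X\} \]
\[ \qquad + \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \sum_{\substack{\gamma_1 + \cdots + \gamma_{|N|} = \alpha \\ \gamma_1, \dots, \gamma_{|N|} < \alpha}} H_{i,j}(\gamma_1, \dots, \gamma_{|N|}) + \sum_{j=1}^{|R_i|} \sum_{\beta < \alpha} Q_{i,j}(\alpha, \beta), \tag{38} \]

where $H_{i,j}(\gamma_1, \dots, \gamma_{|N|})$ and $Q_{i,j}(\alpha, \beta)$ are given with (32) and (28). Finally, if we introduce

\[ c_i^{(\alpha)} = \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \sum_{n=1}^{|N|} \sum_{\substack{\delta_1 + \cdots + \delta_{r_n(B_{i,j})} = \alpha \\ \delta_1, \dots, \delta_{r_n(B_{i,j})} < \alpha}} \binom{\alpha}{\delta_1, \dots, \delta_{r_n(B_{i,j})}} \prod_{l=1}^{r_n(B_{i,j})} m_n^{(\delta_l)}\{X\} \]
\[ \qquad + \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \sum_{\substack{\gamma_1 + \cdots + \gamma_{|N|} = \alpha \\ \gamma_1, \dots, \gamma_{|N|} < \alpha}} H_{i,j}(\gamma_1, \dots, \gamma_{|N|}) + \sum_{j=1}^{|R_i|} \sum_{\beta < \alpha} Q_{i,j}(\alpha, \beta), \tag{39} \]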

the equation (38) can be more compactly written as:

\[ m_i^{(\alpha)}\{X\} = \sum_{n=1}^{|N|} M_{i,n} \cdot m_n^{(\alpha)}\{X\} + c_i^{(\alpha)}, \tag{40} \]

or in matrix form:

\[ m^{(\alpha)} = M \cdot m^{(\alpha)} + c^{(\alpha)}, \tag{41} \]

where $m^{(\alpha)} = [m_1^{(\alpha)}\{X\}, \dots, m_{|N|}^{(\alpha)}\{X\}]^T$ is the cross-moment vector, $c^{(\alpha)} = [c_1^{(\alpha)}, \dots, c_{|N|}^{(\alpha)}]^T$ and $M$ is the expectation
matrix defined in Theorem 3.1. Since we assume that the condition $\rho(M) < 1$ given in Theorem 3.1 is satisfied,
$I - M$ is invertible, and the matrix equation has a unique solution given with

\[ m^{(\alpha)} = (I - M)^{-1} c^{(\alpha)}. \tag{42} \]

Provided that we have computed the inverse $(I - M)^{-1}$, which does not depend on $\alpha$, the cross-moment
is completely determined by the term $c^{(\alpha)}$, which depends on all cross-moments of order lower than $\alpha$ and
can be computed using (39). In the following section, we derive $c^{(\alpha)}$ for scalar random variables up to the
second order, and retrieve the previous results for the first and second order moments [8] as a special
case of the equation (42).



4.1. First order moments 

In the case of the first order moments $\alpha = 1$ (one dimensional multi-indexes are written without
parentheses), and the expression (21) reduces to the expectation of $X$ with respect to the sample space $\Omega_i$,

\[ m_i^{(1)}\{X\} = \sum_{\pi \in \Omega_i} p(\pi)\, X(\pi). \tag{43} \]

The moment vector, $m^{(1)} = [m_1^{(1)}\{X\}, \dots, m_{|N|}^{(1)}\{X\}]^T$, is computed as in the equation (42):

\[ m^{(1)} = (I - M)^{-1} c^{(1)}, \tag{44} \]

where $c^{(1)} = [c_1^{(1)}, \dots, c_{|N|}^{(1)}]^T$. The first and second sum in the expression (39) for $c_i^{(\alpha)}$ reduce to zero and
$c_i^{(1)} = \sum_{j=1}^{|R_i|} Q_{i,j}(1, 0)$, or, after use of the expression (28) for $Q_{i,j}$,

\[ c_i^{(1)} = \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \cdot Y(A_i \to B_{i,j}). \tag{45} \]

7=1 

Let $\pi_1 \cdots \pi_N$ be a derivation starting at the start symbol $A_1$ and ending with a string $u \in \Sigma^*$. If we
set $Y(A_i \to B_{i,j}) = 1$, according to (20), we have $X(\pi_1 \cdots \pi_N) = \sum_{n=1}^{N} Y(\pi_n) = N$, i.e. $X$ is the length of
the derivation. According to the expression (43), the moment $m_i^{(1)}\{X\}$ is the expected derivation length, which
agrees with [ 1 1 and 

Similarly, if we set $Y(A_i \to B_{i,j}) = t(i, j)$, where $t(i, j)$ denotes the number of terminals in the string
$B_{i,j}$, the variable $X(\pi_1 \cdots \pi_N)$ reduces to the length of the word derived from $\pi_1 \cdots \pi_N$. In this case, the moment
$m_i^{(1)}\{X\}$ reduces to the expected string length and the formula (45) reduces to the result from
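The first order formulas (44) and (45) amount to one small linear solve. The sketch below applies them to a hypothetical two-nonterminal grammar that is not taken from the paper. The `solve` helper is a plain Gauss-Jordan elimination written out so that the example has no external dependencies, and `first_moments` is an assumed name. The two calls use the rule-level variable `Y = 1` for the derivation length and `Y = number of terminals` for the string length.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def first_moments(rules, order, y):
    """m^(1) = (I - M)^(-1) c^(1), with c_i = sum_j p(A_i -> B_ij) * Y(A_i -> B_ij)."""
    n = len(order)
    m = [
        [sum(p * rhs.count(a_n) for p, rhs in rules[a_i]) for a_n in order]
        for a_i in order
    ]
    c = [sum(p * y(a_i, rhs) for p, rhs in rules[a_i]) for a_i in order]
    i_minus_m = [
        [(1.0 if r == k else 0.0) - m[r][k] for k in range(n)] for r in range(n)
    ]
    return solve(i_minus_m, c)


# Hypothetical grammar: S -> S S (0.3) | A a (0.7),  A -> A b (0.5) | b (0.5)
RULES = {
    "S": [(0.3, ("S", "S")), (0.7, ("A", "a"))],
    "A": [(0.5, ("A", "b")), (0.5, ("b",))],
}
ORDER = ["S", "A"]

# Y = 1 for every rule: X counts rule applications (derivation length).
derivation_length = first_moments(RULES, ORDER, lambda a, rhs: 1)

# Y = number of terminals on the right-hand side: X is the string length.
string_length = first_moments(
    RULES, ORDER, lambda a, rhs: sum(s not in RULES for s in rhs)
)
```

For this grammar the values can be confirmed by hand: from `A` the derivation length is geometric with mean 2, and the expected length from `S` satisfies `E = 1 + 0.3 * 2E + 0.7 * 2`, which gives 6.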

4.2. Second order moments 

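The formula for the second order moments is somewhat more complicated. In the case when $\alpha = 2$,
we have that $c_i^{(\alpha)}$ is reduced to:

\[ c_i^{(2)} = \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \sum_{n=1}^{|N|} \sum_{\substack{\delta_1 + \cdots + \delta_{r_n(B_{i,j})} = 2 \\ \delta_1, \dots, \delta_{r_n(B_{i,j})} < 2}} \binom{2}{\delta_1, \dots, \delta_{r_n(B_{i,j})}} \prod_{l=1}^{r_n(B_{i,j})} m_n^{(\delta_l)}\{X\} \]
\[ \qquad + \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \sum_{\substack{\gamma_1 + \cdots + \gamma_{|N|} = 2 \\ \gamma_1, \dots, \gamma_{|N|} < 2}} H_{i,j}(\gamma_1, \dots, \gamma_{|N|}) + \sum_{j=1}^{|R_i|} Q_{i,j}(2, 0) + \sum_{j=1}^{|R_i|} Q_{i,j}(2, 1). \tag{46} \]

The first sum in the previous expression can be transformed to:

\[ \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \sum_{n=1}^{|N|} \sum_{\substack{\delta_1 + \cdots + \delta_{r_n(B_{i,j})} = 2 \\ \delta_1, \dots, \delta_{r_n(B_{i,j})} < 2}} \binom{2}{\delta_1, \dots, \delta_{r_n(B_{i,j})}} \prod_{l=1}^{r_n(B_{i,j})} m_n^{(\delta_l)}\{X\} = \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \sum_{n=1}^{|N|} r_n(B_{i,j}) \big( r_n(B_{i,j}) - 1 \big) \cdot m_n^{(1)}\{X\}^2. \tag{47} \]

To compute the second sum we introduce $H_{i,j}^{(a,b)}(\gamma_1, \dots, \gamma_{|N|})$, which is $H_{i,j}(\gamma_1, \dots, \gamma_{|N|})$ with $\gamma_a = \gamma_b = 1$ and
with all other $\gamma$-s equal to zero. We have:

\[ H_{i,j}^{(a,b)}(\gamma_1, \dots, \gamma_{|N|}) = \binom{2}{1, 1} \sum_{\delta_1 + \cdots + \delta_{r_a(B_{i,j})} = 1} \binom{1}{\delta_1, \dots, \delta_{r_a(B_{i,j})}} \prod_{l=1}^{r_a(B_{i,j})} m_a^{(\delta_l)}\{X\} \cdot \sum_{\delta_1 + \cdots + \delta_{r_b(B_{i,j})} = 1} \binom{1}{\delta_1, \dots, \delta_{r_b(B_{i,j})}} \prod_{l=1}^{r_b(B_{i,j})} m_b^{(\delta_l)}\{X\} \tag{48} \]

and

\[ H_{i,j}^{(a,b)}(\gamma_1, \dots, \gamma_{|N|}) = 2 \cdot \sum_{c=1}^{r_a(B_{i,j})} m_a^{(1)}\{X\} \cdot \sum_{d=1}^{r_b(B_{i,j})} m_b^{(1)}\{X\} = 2 \cdot r_a(B_{i,j}) \cdot r_b(B_{i,j}) \cdot m_a^{(1)}\{X\} \cdot m_b^{(1)}\{X\}. \tag{49} \]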

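By substituting in the second sum in (46):

\[ \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \sum_{\substack{\gamma_1 + \cdots + \gamma_{|N|} = 2 \\ \gamma_1, \dots, \gamma_{|N|} < 2}} H_{i,j}(\gamma_1, \dots, \gamma_{|N|}) = \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \sum_{a=1}^{|N|} \sum_{b=a+1}^{|N|} H_{i,j}^{(a,b)}(\gamma_1, \dots, \gamma_{|N|}) \]
\[ = 2 \cdot \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \sum_{a=1}^{|N|} \sum_{b=a+1}^{|N|} r_a(B_{i,j}) \cdot r_b(B_{i,j}) \cdot m_a^{(1)}\{X\}\, m_b^{(1)}\{X\} \]
\[ = \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \sum_{a=1}^{|N|} \sum_{b=1}^{|N|} r_a(B_{i,j}) \cdot r_b(B_{i,j}) \cdot m_a^{(1)}\{X\}\, m_b^{(1)}\{X\} - \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \sum_{n=1}^{|N|} r_n(B_{i,j})^2\, m_n^{(1)}\{X\}^2. \tag{50} \]

Now, (46) reduces to

\[ c_i^{(2)} = CR_i + \sum_{j=1}^{|R_i|} Q_{i,j}(2, 0) + \sum_{j=1}^{|R_i|} Q_{i,j}(2, 1), \tag{51} \]

where

\[ CR_i = \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \sum_{a=1}^{|N|} \sum_{b=1}^{|N|} r_a(B_{i,j}) \cdot r_b(B_{i,j}) \cdot m_a^{(1)}\{X\}\, m_b^{(1)}\{X\} - \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \sum_{n=1}^{|N|} r_n(B_{i,j})\, m_n^{(1)}\{X\}^2 \tag{52} \]

and

\[ Q_{i,j}(2, 0) = p(A_i \to B_{i,j}) \cdot Y(A_i \to B_{i,j})^2, \tag{53} \]

\[ Q_{i,j}(2, 1) = 2 \cdot p(A_i \to B_{i,j}) \cdot Y(A_i \to B_{i,j}) \sum_{n=1}^{|N|} \sum_{l=1}^{r_n(B_{i,j})} m_n^{(1)}\{X\} = 2 \cdot p(A_i \to B_{i,j}) \cdot Y(A_i \to B_{i,j}) \sum_{n=1}^{|N|} r_n(B_{i,j})\, m_n^{(1)}\{X\}. \tag{54} \]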

By setting $Y(A_i \to B_{i,j}) = 1$ for all $A_i \to B_{i,j} \in R$, $X$ becomes the derivation length. The formula for computing
the second order moments of derivation length is given in [8] and it can be derived from the equation (51),
since

\[ \sum_{j=1}^{|R_i|} Q_{i,j}(2, 0) = \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \cdot Y(A_i \to B_{i,j})^2 = \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) = 1, \tag{55} \]

\[ \sum_{j=1}^{|R_i|} Q_{i,j}(2, 1) = 2 \cdot \sum_{n=1}^{|N|} \Big( \sum_{j=1}^{|R_i|} p(A_i \to B_{i,j}) \cdot r_n(B_{i,j}) \Big) m_n^{(1)}\{X\} = 2 \cdot \sum_{n=1}^{|N|} M_{i,n}\, m_n^{(1)}\{X\} = 2 \cdot m_i^{(1)}\{X\} - 2, \tag{56} \]

where the last equation follows from (42), and

\[ c_i^{(2)} = CR_i + 2 \cdot m_i^{(1)}\{X\} - 1. \tag{57} \]

Finally, by substituting (57) in (42) we obtain

\[ m^{(2)}\{X\} = (I - M)^{-1} \cdot \big( CR + 2 \cdot m^{(1)} - \mathbf{1} \big), \tag{58} \]

where $CR = [CR_1, \dots, CR_{|N|}]^T$ and $\mathbf{1} = [1, \dots, 1]^T$, in agreement with [8].
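Equations (52), (57) and (58) can be exercised on a small example. The sketch below computes the first and second moments of the derivation length (the case where every rule has `Y = 1`) for a hypothetical two-nonterminal grammar that is not taken from the paper. The function names are assumptions. The double sum over `a` and `b` in (52) is evaluated as the square of a single sum.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def derivation_length_moments(rules, order):
    """First and second moments of the derivation length, via (44) and (58)."""
    n = len(order)
    # counts[a][j][k] = r_k(B_aj): occurrences of nonterminal k in rule j of a.
    counts = {a: [[rhs.count(b) for b in order] for _, rhs in rules[a]] for a in order}
    m = [
        [sum(p * counts[a][j][k] for j, (p, _) in enumerate(rules[a])) for k in range(n)]
        for a in order
    ]
    i_minus_m = [
        [(1.0 if r == k else 0.0) - m[r][k] for k in range(n)] for r in range(n)
    ]
    m1 = solve(i_minus_m, [1.0] * n)            # c^(1) = 1 when Y = 1
    cr = []
    for a in order:
        total = 0.0
        for j, (p, _) in enumerate(rules[a]):
            r = counts[a][j]
            s = sum(rk * mk for rk, mk in zip(r, m1))
            # sum_a sum_b r_a r_b m_a m_b = s^2; subtract sum_n r_n m_n^2.
            total += p * (s * s - sum(rk * mk * mk for rk, mk in zip(r, m1)))
        cr.append(total)
    c2 = [cr_i + 2.0 * m1_i - 1.0 for cr_i, m1_i in zip(cr, m1)]   # equation (57)
    return m1, solve(i_minus_m, c2)


# Hypothetical grammar: S -> S S (0.3) | A a (0.7),  A -> A b (0.5) | b (0.5)
RULES = {
    "S": [(0.3, ("S", "S")), (0.7, ("A", "a"))],
    "A": [(0.5, ("A", "b")), (0.5, ("b",))],
}
m1, m2 = derivation_length_moments(RULES, ["S", "A"])
variance_from_s = m2[0] - m1[0] ** 2
```

As a hand check, the derivation length from `A` is geometric with success probability 0.5, so its second moment is 6, and the recursion for `S` gives a second moment of 92.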
5. Conditional cross-moments computation for SCFGs

Let $G = (\Sigma, N, A_1, R, \mu)$ be a weighted context-free grammar over a commutative semiring $(K, +, \cdot, 0, 1)$,
and $\Omega_i(u)$ be the set of all derivations which derive $u \in \Sigma^*$ starting from $A_i$. The inside weight of the weighted
grammar $G$ is the function $\sigma : N \times \Sigma^* \to K$, defined as the sum of weights of all derivations starting with
$A_i$ and ending with $u$,

\[ \sigma_i(u) = \sum_{\pi \in \Omega_i(u)} \mu(\pi), \tag{59} \]

for $1 \le i \le |N|$ and $u \in \Sigma^*$. Let $A_i \to B_{i,j} \in R$ and

\[ B_{i,j} = v_1 A_{i_1} v_2 A_{i_2} \cdots v_k A_{i_k} v_{k+1}, \tag{60} \]

where $v_l \in \Sigma^*$ and $A_{i_l} \in N$. For the cycle-free reduced grammars the inside weight can be computed using
the inside algorithm [5] and [17], which after recursive application of

\[ \sigma_i(u) = \sum_{j=1}^{|R_i|} \sum_{\substack{u_1, u_2, \dots, u_k \in \Sigma^* \\ u = v_1 u_1 v_2 u_2 \cdots v_k u_k v_{k+1}}} \mu(A_i \to B_{i,j}) \cdot \prod_{l=1}^{k} \sigma_{i_l}(u_l) \tag{61} \]

ends with the equation in which only rules $A_i \to u$, $u \in \Sigma^*$, appear on the right hand side:

\[ \sigma_i(u) = \mu(A_i \to u). \tag{62} \]
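Now, let $\widetilde{G} = (\Sigma, N, A_1, R, \mu)$ be the moment generating grammar for $G = (\Sigma, N, A_1, R, p)$ with

\[ \mu(\pi) = p(\pi)\, e^{t^T Y(\pi)} \tag{63} \]

and

\[ X(\pi_1 \cdots \pi_N) = Y(\pi_1) + \cdots + Y(\pi_N). \tag{64} \]

According to the discussion in Section 3, the value of the inside weight at the nonterminal $A_i$ corresponds
to the $i$-th conditional moment generating function, $\sigma_i(u) = M_{X|u}^{(i)}$, and the $i$-th conditional cross-moment

\[ m_i^{(\alpha)}\{X \mid u\} = D_\alpha\{M_{X|u}^{(i)}\} = D_\alpha\{\sigma_i(u)\} = \sum_{\pi \in \Omega_i(u)} p(\pi)\, X(\pi)^{\alpha}, \tag{65} \]

and it can be computed by differentiating (61) and solving the resulting equation:

\[ D_\alpha\{\sigma_i(u)\} = \sum_{j=1}^{|R_i|} \sum_{\substack{u_1, u_2, \dots, u_k \in \Sigma^* \\ u = v_1 u_1 v_2 u_2 \cdots v_k u_k v_{k+1}}} \sum_{\beta \le \alpha} \binom{\alpha}{\beta} D_{\alpha-\beta}\{\mu(A_i \to B_{i,j})\} \cdot D_\beta\Big\{ \prod_{l=1}^{k} \sigma_{i_l}(u_l) \Big\}. \tag{66} \]

By use of the generalized Leibniz's rule (7) we have

\[ D_\alpha\{\sigma_i(u)\} = \sum_{j=1}^{|R_i|} \sum_{\substack{u_1, u_2, \dots, u_k \in \Sigma^* \\ u = v_1 u_1 v_2 u_2 \cdots v_k u_k v_{k+1}}} \sum_{\beta \le \alpha} \binom{\alpha}{\beta} D_{\alpha-\beta}\{\mu(A_i \to B_{i,j})\} \cdot \sum_{\gamma_1 + \cdots + \gamma_k = \beta} \binom{\beta}{\gamma_1, \dots, \gamma_k} \prod_{l=1}^{k} D_{\gamma_l}\{\sigma_{i_l}(u_l)\}, \tag{67} \]

and after use of the equality

\[ D_{\alpha-\beta}\{\mu(A_i \to B_{i,j})\} = p(A_i \to B_{i,j}) \cdot Y(A_i \to B_{i,j})^{\alpha-\beta}, \tag{68} \]

the recursive equation becomes

\[ m_i^{(\alpha)}\{X \mid u\} = \sum_{j=1}^{|R_i|} \sum_{\substack{u_1, u_2, \dots, u_k \in \Sigma^* \\ u = v_1 u_1 v_2 u_2 \cdots v_k u_k v_{k+1}}} \sum_{\beta \le \alpha} \binom{\alpha}{\beta}\, p(A_i \to B_{i,j}) \cdot Y(A_i \to B_{i,j})^{\alpha-\beta} \sum_{\gamma_1 + \cdots + \gamma_k = \beta} \binom{\beta}{\gamma_1, \dots, \gamma_k} \prod_{l=1}^{k} m_{i_l}^{(\gamma_l)}\{X \mid u_l\}. \tag{69} \]

The base case is derived by differentiating (62):

\[ m_i^{(\gamma)}\{X \mid u\} = D_\gamma\{\sigma_i(u)\} = D_\gamma\{\mu(A_i \to u)\} = p(A_i \to u) \cdot Y(A_i \to u)^{\gamma}. \tag{70} \]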

The algorithm developed above for the conditional cross-moments computation generalizes the algorithm by
Li and Eisner [11] for the cross-moments of order $\alpha = (1, 1)$. Li and Eisner introduced the second order
entropy semiring, and ran the inside algorithm on it, obtaining the recursive formulas which are a special
case of (69)-(70). The algorithm for the moments of order $\alpha = 1$ is provided by Hwa [9], where conditional
entropy is considered. As noted in [2], Hwa's algorithm can be obtained by running the inside algorithm
over the first order entropy semiring [3]. In the following subsection we introduce the binomial semiring
of an order $\alpha$, which is the generalization of the first and second order entropy semirings. After that we run
the inside algorithm over the binomial semiring and show that this approach yields the equations (69)-(70).

5.1. Conditional cross-moments computation using the inside algorithm over the binomial semiring 

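Let $\nu = (\nu_1, \dots, \nu_d) \in \mathbb{N}_0^d$ be a multi-index and $|\nu| = \nu_1 + \cdots + \nu_d$, and let $N_\nu$ be the cardinality of the set
$\{\alpha : \alpha \le \nu\}$. We define the map $\varphi : C_\nu \to \mathbb{R}^{N_\nu}$, which to every $f \in C_\nu$ associates the tuple $\mathbf{f}$, indexed by a
multi-index $\alpha \le \nu$, such that

\[ \varphi(f) = \mathbf{f} = (f_\alpha)_{\alpha \le \nu} = \big( D_\alpha\{f\} \big)_{\alpha \le \nu}. \tag{71} \]

Conversely, we define the map $\hat{\varphi} : \mathbb{R}^{N_\nu} \to C_\nu$, which to every $\mathbf{f} = (f_\alpha)_{\alpha \le \nu} \in \mathbb{R}^{N_\nu}$ associates the polynomial
function

\[ f = \hat{\varphi}(\mathbf{f}); \quad f(t) = \sum_{\alpha \le \nu} f_\alpha \frac{t^{\alpha}}{\alpha!}, \quad \text{for all } t \in \mathbb{R}^d, \tag{72} \]

having the property $D_\alpha\{f\} = f_\alpha$ and $\varphi(\hat{\varphi}(\mathbf{f})) = (f_\alpha)_{\alpha \le \nu} = \mathbf{f}$. Hereinafter, for the pair $(f, \mathbf{f})$ we always assume
$\mathbf{f} = \varphi(f)$ and $f = \hat{\varphi}(\mathbf{f})$, for all $\mathbf{f} \in \mathbb{R}^{N_\nu}$ and $f \in C_\nu$.

For all $\mathbf{f}, \mathbf{g} \in \mathbb{R}^{N_\nu}$, let the operations $\oplus$ and $\otimes$ be defined with $\mathbf{f} \oplus \mathbf{g} = \varphi(f + g)$, $\mathbf{f} \otimes \mathbf{g} = \varphi(f \cdot g)$. After use
of the generalized Leibniz's product rule (6), we have

\[ \mathbf{f} \oplus \mathbf{g} = \varphi(f + g) = \big( D_\alpha\{f\} + D_\alpha\{g\} \big)_{\alpha \le \nu} = ( f_\alpha + g_\alpha )_{\alpha \le \nu}, \tag{73} \]

\[ \mathbf{f} \otimes \mathbf{g} = \varphi(f \cdot g) = \Big( \sum_{\beta \le \alpha} \binom{\alpha}{\beta} D_\beta\{f\} \cdot D_{\alpha-\beta}\{g\} \Big)_{\alpha \le \nu} = \Big( \sum_{\beta \le \alpha} \binom{\alpha}{\beta} f_\beta \cdot g_{\alpha-\beta} \Big)_{\alpha \le \nu}. \tag{74} \]

By the map $\varphi$, the zero function is mapped to the tuple $\mathbf{0} = (D_\alpha\{0\})_{\alpha \le \nu}$, which is $0$ at all $\alpha$-positions, while the
identity function is mapped to $\mathbf{1} = (D_\alpha\{1\})_{\alpha \le \nu}$, which is $1$ at the position $\alpha = 0$, since $D_0\{\cdot\}$ is the identity map,
and $0$ at all other $\alpha$-positions:

\[ \mathbf{0} = \underbrace{(0, 0, \dots, 0)}_{N_\nu \text{ times}}, \qquad \mathbf{1} = \underbrace{(1, 0, \dots, 0)}_{N_\nu \text{ times}}. \tag{75} \]

With these settings, the tuple $(\mathbb{R}^{N_\nu}, \oplus, \otimes, \mathbf{0}, \mathbf{1})$ is a commutative semiring, which we call the binomial
semiring of an order $\nu$.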
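For a scalar order $\nu$, the binomial semiring can be written down in a few lines. The sketch below is a hedged illustration rather than the authors' implementation: elements are tuples `(f_0, ..., f_nu)`, the function names `b_plus`, `b_times` and `rule_weight` are hypothetical, and `rule_weight` builds the tuple with entries `p * Y**a`, i.e. the derivatives at zero of `p * exp(t * Y)`. Exact rational arithmetic from the standard `fractions` module keeps the comparison at the end free of rounding error.

```python
from fractions import Fraction
from math import comb


def b_plus(f, g):
    """Component-wise addition, as in equation (73)."""
    return tuple(a + b for a, b in zip(f, g))


def b_times(f, g):
    """Binomial convolution, as in equation (74), for a scalar order."""
    return tuple(
        sum(comb(a, b) * f[b] * g[a - b] for b in range(a + 1))
        for a in range(len(f))
    )


def rule_weight(p, y, order):
    """Tuple (p * y**a) for a = 0..order: derivatives of p * exp(t*y) at t = 0."""
    return tuple(p * y ** a for a in range(order + 1))


ORDER = 3
ZERO = (Fraction(0),) * (ORDER + 1)
ONE = (Fraction(1),) + (Fraction(0),) * ORDER

# Two hypothetical rules with probabilities 1/2 and 1/3 and values Y = 2 and Y = 3.
w1 = rule_weight(Fraction(1, 2), 2, ORDER)
w2 = rule_weight(Fraction(1, 3), 3, ORDER)

# The product should carry p1 * p2 * (Y1 + Y2)**a in component a.
product_weight = b_times(w1, w2)
expected = tuple(Fraction(1, 6) * 5 ** a for a in range(ORDER + 1))
```

The comparison of `product_weight` with `expected` is the scalar instance of the identity that the product of rule weights along a derivation carries the powers of the summed variable.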

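Example 5.1. The first order entropy semiring [3] is a special case of the binomial semiring for $\alpha = 1$. In
this case the binomial semiring reduces to $(\mathbb{R}^2, \oplus, \otimes, \mathbf{0}, \mathbf{1})$, with $\mathbf{0} = (0, 0)$, $\mathbf{1} = (1, 0)$ and

\[ \mathbf{f} \oplus \mathbf{g} = ( f_0 + g_0,\ f_1 + g_1 ), \tag{76} \]
\[ \mathbf{f} \otimes \mathbf{g} = ( f_0 \cdot g_0,\ f_0 \cdot g_1 + f_1 \cdot g_0 ), \tag{77} \]

where the first component corresponds to $\alpha = 0$ and the second one to $\alpha = 1$, in agreement with @|.

Example 5.2. The second order entropy semiring [11] is a special case of the binomial semiring for $\alpha = (1, 1)$.
In this case the binomial semiring reduces to $(\mathbb{R}^4, \oplus, \otimes, \mathbf{0}, \mathbf{1})$, with $\mathbf{0} = (0, 0, 0, 0)$, $\mathbf{1} = (1, 0, 0, 0)$ and

\[ \mathbf{f} \oplus \mathbf{g} = ( f_{00} + g_{00},\ f_{01} + g_{01},\ f_{10} + g_{10},\ f_{11} + g_{11} ), \tag{78} \]
\[ \mathbf{f} \otimes \mathbf{g} = ( f_{00} \cdot g_{00},\ f_{00} \cdot g_{01} + f_{01} \cdot g_{00},\ f_{00} \cdot g_{10} + f_{10} \cdot g_{00},\ f_{00} \cdot g_{11} + f_{01} \cdot g_{10} + f_{10} \cdot g_{01} + f_{11} \cdot g_{00} ), \tag{79} \]

where the first component corresponds to $\alpha = (0, 0)$, the second to $\alpha = (0, 1)$, the third to $\alpha = (1, 0)$, and the
fourth to $\alpha = (1, 1)$, which agrees with [11].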

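By induction and by use of (7), we can obtain the following equations which hold for arbitrary tuples
$\mathbf{u}^{(1)}, \dots, \mathbf{u}^{(N)}$ from $\mathbb{R}^{N_\nu}$:

\[ \bigotimes_{n=1}^{N} \mathbf{u}^{(n)} = \varphi\Big( \prod_{n=1}^{N} u^{(n)} \Big) = \Big( D_\alpha\Big\{ \prod_{n=1}^{N} u^{(n)} \Big\} \Big)_{\alpha \le \nu} = \Big( \sum_{\beta_1 + \cdots + \beta_N = \alpha} \binom{\alpha}{\beta_1, \dots, \beta_N} \prod_{n=1}^{N} u^{(n)}_{\beta_n} \Big)_{\alpha \le \nu}. \tag{80} \]

Accordingly, by setting

\[ \boldsymbol{\mu}(\pi_n) = \big( \mu(\pi_n)_\alpha \big)_{\alpha \le \nu} = \big( p(\pi_n)\, Y(\pi_n)^{\alpha} \big)_{\alpha \le \nu} \tag{81} \]

for each derivation $\pi_1 \cdots \pi_N \in \Omega$ we have

\[ \Big( \bigotimes_{n=1}^{N} \boldsymbol{\mu}(\pi_n) \Big)_\alpha = \sum_{\beta_1 + \cdots + \beta_N = \alpha} \binom{\alpha}{\beta_1, \dots, \beta_N} \prod_{n=1}^{N} \mu(\pi_n)_{\beta_n} = \sum_{\beta_1 + \cdots + \beta_N = \alpha} \binom{\alpha}{\beta_1, \dots, \beta_N} \prod_{n=1}^{N} p(\pi_n)\, Y(\pi_n)^{\beta_n} \]
\[ = \Big( \prod_{n=1}^{N} p(\pi_n) \Big) \sum_{\beta_1 + \cdots + \beta_N = \alpha} \binom{\alpha}{\beta_1, \dots, \beta_N} \prod_{n=1}^{N} Y(\pi_n)^{\beta_n} = p(\pi) \cdot \Big( \sum_{n=1}^{N} Y(\pi_n) \Big)^{\alpha} = p(\pi) \cdot X(\pi)^{\alpha}. \tag{82} \]

In addition,

\[ \bigoplus_{n=1}^{N} \mathbf{u}^{(n)} = \varphi\Big( \sum_{n=1}^{N} u^{(n)} \Big) = \Big( D_\alpha\Big\{ \sum_{n=1}^{N} u^{(n)} \Big\} \Big)_{\alpha \le \nu} = \Big( \sum_{n=1}^{N} D_\alpha\{u^{(n)}\} \Big)_{\alpha \le \nu} = \Big( \sum_{n=1}^{N} u^{(n)}_\alpha \Big)_{\alpha \le \nu}, \tag{83} \]

and the inside weight is

\[ \boldsymbol{\sigma}_i(u) = \bigoplus_{\pi \in \Omega_i(u)} \boldsymbol{\mu}(\pi) = \bigoplus_{\pi \in \Omega_i(u)} \bigotimes_{n=1}^{N} \boldsymbol{\mu}(\pi_n) = \Big( \sum_{\pi \in \Omega_i(u)} p(\pi) \cdot X(\pi)^{\alpha} \Big)_{\alpha \le \nu}, \tag{84} \]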
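for $1 \le i \le |N|$ and $u \in \Sigma^*$. Let $A_i \to B_{i,j} \in R$ and $B_{i,j} = v_1 A_{i_1} v_2 A_{i_2} \cdots v_k A_{i_k} v_{k+1}$, where $v_l \in \Sigma^*$ and $A_{i_l} \in N$.
The inside recursive equations in the binomial semiring have the form:

\[ \big( \boldsymbol{\sigma}_i(u) \big)_\alpha = \Big( \bigoplus_{j=1}^{|R_i|} \bigoplus_{u_1, u_2, \dots, u_k \in \Sigma^*} \boldsymbol{\mu}(A_i \to B_{i,j}) \otimes \bigotimes_{l=1}^{k} \boldsymbol{\sigma}_{i_l}(u_l) \Big)_\alpha = \sum_{j=1}^{|R_i|} \sum_{u_1, u_2, \dots, u_k \in \Sigma^*} \Big( \boldsymbol{\mu}(A_i \to B_{i,j}) \otimes \bigotimes_{l=1}^{k} \boldsymbol{\sigma}_{i_l}(u_l) \Big)_\alpha \tag{85} \]
\[ = \sum_{j=1}^{|R_i|} \sum_{u_1, u_2, \dots, u_k \in \Sigma^*} \sum_{\beta \le \alpha} \binom{\alpha}{\beta} \mu(A_i \to B_{i,j})_{\alpha-\beta} \cdot \Big( \bigotimes_{l=1}^{k} \boldsymbol{\sigma}_{i_l}(u_l) \Big)_\beta, \tag{86} \]

and after substituting $\mu(A_i \to B_{i,j})_{\alpha-\beta} = p(A_i \to B_{i,j}) \cdot Y(A_i \to B_{i,j})^{\alpha-\beta}$, and the equation (80) for the product
of $k$ terms in the binomial semiring, we get the recursive equation

\[ \big( \boldsymbol{\sigma}_i(u) \big)_\alpha = \sum_{j=1}^{|R_i|} \sum_{\substack{u_1, u_2, \dots, u_k \in \Sigma^* \\ u = v_1 u_1 v_2 u_2 \cdots v_k u_k v_{k+1}}} \sum_{\beta \le \alpha} \binom{\alpha}{\beta}\, p(A_i \to B_{i,j}) \cdot Y(A_i \to B_{i,j})^{\alpha-\beta} \sum_{\gamma_1 + \cdots + \gamma_k = \beta} \binom{\beta}{\gamma_1, \dots, \gamma_k} \prod_{l=1}^{k} \big( \boldsymbol{\sigma}_{i_l}(u_l) \big)_{\gamma_l}, \tag{87} \]

with the base case:

\[ \big( \boldsymbol{\sigma}_i(u) \big)_\gamma = \mu(A_i \to u)_\gamma = p(A_i \to u) \cdot Y(A_i \to u)^{\gamma}, \tag{88} \]

which is the same recursive procedure as (69)-(70).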
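The recursion above can be run as a CYK-style chart computation. The sketch below makes several assumptions that are not in the paper: the grammar is restricted to binary rules `A -> B C` and lexical rules `A -> a`, the order is a scalar, the chart layout and all function names are hypothetical, and the toy grammar `S -> S S (3/10) | a (7/10)` with `Y = 1` on every rule is chosen so that `X` is the derivation length. The result for a string `u` is the tuple of unnormalized sums of `p(pi) * X(pi)**a` over all derivations of `u`.

```python
from fractions import Fraction
from math import comb


def b_plus(f, g):
    return tuple(a + b for a, b in zip(f, g))


def b_times(f, g):
    return tuple(
        sum(comb(a, b) * f[b] * g[a - b] for b in range(a + 1))
        for a in range(len(f))
    )


def weight(p, y, order):
    """Rule weight (p * y**a) for a = 0..order."""
    return tuple(p * y ** a for a in range(order + 1))


def inside(word, binary, lexical, order):
    """Chart of inside weights in the scalar binomial semiring.

    binary:  list of (A, B, C, p, y) for rules A -> B C
    lexical: list of (A, a, p, y)    for rules A -> a
    Returns {(i, j, A): tuple} for the substring word[i:j].
    """
    n = len(word)
    zero = (Fraction(0),) * (order + 1)
    chart = {}
    for i, symbol in enumerate(word):                 # base case
        for a, terminal, p, y in lexical:
            if terminal == symbol:
                key = (i, i + 1, a)
                chart[key] = b_plus(chart.get(key, zero), weight(p, y, order))
    for span in range(2, n + 1):                      # recursive case
        for i in range(n - span + 1):
            j = i + span
            for a, b, c, p, y in binary:
                for k in range(i + 1, j):
                    left = chart.get((i, k, b))
                    right = chart.get((k, j, c))
                    if left is None or right is None:
                        continue
                    term = b_times(weight(p, y, order), b_times(left, right))
                    key = (i, j, a)
                    chart[key] = b_plus(chart.get(key, zero), term)
    return chart


BINARY = [("S", "S", "S", Fraction(3, 10), 1)]
LEXICAL = [("S", "a", Fraction(7, 10), 1)]
chart = inside("aaa", BINARY, LEXICAL, order=2)
moments = chart[(0, 3, "S")]
mean_length = moments[1] / moments[0]   # normalized by the string probability
```

The string `aaa` has two derivations, each with five rule applications and probability `(3/10)**2 * (7/10)**3`, so the three components are that probability doubled, then multiplied by 5 and by 25.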

5.2. First order conditional moments 

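In the case of first order conditional moments $\alpha = 1$, the conditional cross-moment (65) is the expectation
of $X$:

\[ m_i^{(1)}\{X \mid u\} = \sum_{\pi \in \Omega_i(u)} p(\pi)\, X(\pi). \tag{89} \]

In this case, the recursive equations (69)-(70) reduce to

\[ m_i^{(0)}\{X \mid u\} = \sum_{j=1}^{|R_i|} \sum_{\substack{u_1, u_2, \dots, u_k \in \Sigma^* \\ u = v_1 u_1 v_2 u_2 \cdots v_k u_k v_{k+1}}} p(A_i \to B_{i,j}) \cdot \prod_{l=1}^{k} m_{i_l}^{(0)}\{X \mid u_l\}, \tag{90} \]

\[ m_i^{(1)}\{X \mid u\} = \sum_{j=1}^{|R_i|} \sum_{\substack{u_1, u_2, \dots, u_k \in \Sigma^* \\ u = v_1 u_1 v_2 u_2 \cdots v_k u_k v_{k+1}}} p(A_i \to B_{i,j}) \cdot Y(A_i \to B_{i,j}) \cdot \prod_{l=1}^{k} m_{i_l}^{(0)}\{X \mid u_l\} \]
\[ \qquad + \sum_{j=1}^{|R_i|} \sum_{\substack{u_1, u_2, \dots, u_k \in \Sigma^* \\ u = v_1 u_1 v_2 u_2 \cdots v_k u_k v_{k+1}}} p(A_i \to B_{i,j}) \cdot \sum_{n=1}^{k} m_{i_n}^{(1)}\{X \mid u_n\} \prod_{\substack{l=1 \\ l \ne n}}^{k} m_{i_l}^{(0)}\{X \mid u_l\}, \tag{91} \]

with the base case:

\[ m_i^{(0)}\{X \mid u\} = p(A_i \to u), \qquad m_i^{(1)}\{X \mid u\} = p(A_i \to u) \cdot Y(A_i \to u). \tag{92} \]

In [9], Hwa considered the conditional entropy of the grammar given in Chomsky form, for which
$B_{i,j} = v_1 A_{i_1} v_2 A_{i_2} v_3$ and $v_1, v_2, v_3$ are equal to the empty string. The conditional entropy is obtained as the
moment $m_i^{(1)}\{X \mid u\}$, where $X(\pi) = -\log p(\pi)$, for all $\pi \in \Omega_i$, and Hwa's algorithm can be retrieved by
imposing the Chomsky form condition in (69)-(70), with $Y(\pi_i) = -\log p(\pi_i)$.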

6. Conclusion 

In this paper we considered the problem of computing the cross-moments and the conditional cross- 
moments of a vector variable represented by a stochastic context-free grammar. The recursive formulas for 
cross-moments of arbitrary order are obtained, and the previously developed formulas for moments [8], [19] 
and conditional cross-moments [11] are derived as special cases. 